131 research outputs found

    A Study in function optimization with the breeder genetic algorithm

    Optimization is concerned with finding the global optima (hence the name) of problems that can be cast as a function of several variables and constraints on them. Among search methods, Evolutionary Algorithms have been shown to be adaptable and general tools that have often outperformed traditional ad hoc methods. The Breeder Genetic Algorithm (BGA) combines a direct representation with conceptual simplicity. This work contains a general description of the algorithm and a detailed study on a collection of function optimization tasks. The results show that the BGA is a powerful and reliable search algorithm. The main discussion concerns the choice of genetic operators and their parameters, among which the family of Extended Intermediate Recombination (EIR) is shown to stand out. In addition, a simple method to dynamically adjust the operator is outlined and found to greatly improve on the already excellent overall performance of the algorithm.
    Postprint (published version)
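The abstract above names Extended Intermediate Recombination without defining it. As a minimal sketch, assuming the form commonly given in the BGA literature (each offspring gene is `x_i + alpha_i * (y_i - x_i)` with `alpha_i` drawn uniformly from `[-delta, 1 + delta]`), the operator might look as follows. The function name, the `rng` parameter, and the default `delta=0.25` are illustrative assumptions, not details taken from the paper.

```python
import random


def extended_intermediate_recombination(parent_x, parent_y, delta=0.25, rng=random):
    """Return one offspring of two real-valued parents.

    Each gene is x_i + alpha_i * (y_i - x_i), with a fresh alpha_i drawn
    uniformly from [-delta, 1 + delta] for every gene. The default delta
    is an illustrative assumption.
    """
    if len(parent_x) != len(parent_y):
        raise ValueError("parents must have the same length")
    child = []
    for x_i, y_i in zip(parent_x, parent_y):
        # delta > 0 lets the offspring fall slightly outside the segment
        # joining the two parent values.
        alpha = rng.uniform(-delta, 1.0 + delta)
        child.append(x_i + alpha * (y_i - x_i))
    return child
```

With `delta = 0` the offspring always lies between its parents gene by gene; larger values widen the region that can be explored.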

    Instance and feature weighted k-nearest-neighbors algorithm

    We present a novel method that aims at providing a more stable selection of feature subsets when variations in the training process occur. This is accomplished by using an instance-weighting process (assigning different importances to instances) as a preprocessing step to a feature weighting method that is independent of the learner, and then making good use of both sets of computed weights in a standard Nearest-Neighbours classifier. We report extensive experimentation on well-known benchmarking datasets as well as some challenging microarray gene expression problems. Our results show increases in stability for most subset sizes and most problems, without compromising prediction accuracy.
    Peer Reviewed
    Postprint (published version)
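The abstract above does not say how the two sets of weights enter the classifier. One plausible reading, sketched here purely as an assumption, is that feature weights scale the distance and instance weights scale each neighbour's vote. The function name and all parameters are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, feature_w, instance_w, k=3):
    """Classify `query` with a k-NN rule that uses two kinds of weights.

    Assumption (not from the paper): feature weights enter a weighted
    Euclidean distance, and each of the k nearest neighbours votes with
    its instance weight.
    """
    def dist(a, b):
        return math.sqrt(
            sum(w * (ai - bi) ** 2 for w, ai, bi in zip(feature_w, a, b))
        )

    # Indices of training points ordered by weighted distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = defaultdict(float)
    for i in order[:k]:
        votes[train_y[i]] += instance_w[i]
    return max(votes, key=votes.get)
```

Setting every weight to 1 recovers the plain majority-vote k-NN classifier, which makes the effect of either weight set easy to isolate.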

    Heterogeneous Kohonen networks

    A large number of practical problems involve elements that are described as a mixture of qualitative and quantitative information, and whose description is probably incomplete. The self-organizing map is an effective tool for visualization of high-dimensional continuous data. In this work, we extend the network and training algorithm to cope with heterogeneous information, as well as missing values. The classification performance on a collection of benchmarking data sets is compared in different configurations. Various visualization methods are suggested to help users interpret post-training results.
    Peer Reviewed
    Postprint (author's final draft)

    Exploiting the accumulated evidence for gene selection in microarray gene expression data

    Machine Learning methods have of late made significant efforts toward solving multidisciplinary problems in the field of cancer classification using microarray gene expression data. Feature subset selection methods can play an important role in the modeling process, since these tasks are characterized by a large number of features and few observations, making the modeling a non-trivial undertaking. In this particular scenario, it is extremely important to select genes by taking into account the possible interactions with other gene subsets. This paper shows that, by accumulating the evidence in favour of (or against) each gene along the search process, the obtained gene subsets may constitute better solutions, in terms of predictive accuracy, gene subset size, or both. The proposed technique is extremely simple and applicable at a negligible overhead in cost.
    Postprint (published version)

    Similarity networks for classification: a case study in the Horse Colic problem

    This paper develops a two-layer neural network in which the neuron model computes a user-defined similarity function between inputs and weights. The neuron transfer function is formed by composition of an adapted logistic function with the mean of the partial input-weight similarities. The resulting neuron model is capable of dealing directly with variables of potentially different nature (continuous, fuzzy, ordinal, categorical). There is also provision for missing values. The network is trained using a two-stage procedure very similar to that used to train a radial basis function (RBF) neural network. The network is compared to two types of RBF networks on a non-trivial dataset: the Horse Colic problem, taken as a case study and analyzed in detail.
    Postprint (published version)
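The neuron described above (a logistic function applied to the mean of partial input-weight similarities, with missing values tolerated) can be sketched as follows. This is only an illustration under stated assumptions: the per-type similarity measures, the handling of missing values by skipping them, and the `steepness` and 0.5 centring of the logistic are all choices made here, since the abstract does not give the paper's exact "adapted" logistic. Only continuous and categorical variables are covered.

```python
import math


def partial_similarity(x, w, kind, value_range=1.0):
    """Similarity in [0, 1] between one input and one weight component."""
    if kind == "continuous":
        # Range-normalised closeness; an assumed, Gower-style choice.
        return max(0.0, 1.0 - abs(x - w) / value_range)
    if kind == "categorical":
        return 1.0 if x == w else 0.0
    raise ValueError(f"unknown variable kind: {kind}")


def similarity_neuron(inputs, weights, kinds, ranges, steepness=10.0):
    """Logistic squashing of the mean partial similarity.

    Missing inputs (None) are skipped so they do not affect the mean.
    `steepness` and the 0.5 centre are illustrative assumptions.
    """
    sims = [
        partial_similarity(x, w, kind, r)
        for x, w, kind, r in zip(inputs, weights, kinds, ranges)
        if x is not None
    ]
    if not sims:
        return 0.5  # assumed neutral output when every input is missing
    mean_sim = sum(sims) / len(sims)
    return 1.0 / (1.0 + math.exp(-steepness * (mean_sim - 0.5)))
```

A neuron whose weights match the input exactly responds close to 1, a fully dissimilar input gives a response close to 0, and a missing component simply drops out of the average.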

    Developments in kernel design

    The aim of this paper is to give a concise overview of kernels, with special attention to non-standard or heterogeneous data sources (e.g. non-numerical or structured data). A second goal is to discuss the world of possibilities that kernel design opens for the principled analysis of special or new application domains. The reader is referred to some of the excellent survey publications (such as [1, 2, 3]) for in-depth coverage.
    Postprint (published version)

    Learning in networks of similarity processing neurons

    Similarity functions are a very flexible container in which to express knowledge about a problem as well as to capture the meaningful relations in input space. In this paper we describe ongoing research using similarity functions to find more convenient representations for a problem (a crucial factor for successful learning) such that subsequent processing can be handed over to linear or non-linear modeling methods. The idea is tested on a set of challenging problems, characterized by a mixture of data types and different amounts of missing values. We report a series of experiments testing the idea against two more traditional approaches, one ignoring the knowledge about the dataset and another using this knowledge to pre-process it. The preliminary results demonstrate competitive or better generalization performance than that found in the literature. In addition, there is a considerable enhancement in the interpretability of the obtained models.
    Postprint (published version)

    Similarity-based heterogeneous neuron models

    This paper introduces a general class of neuron models, accepting heterogeneous inputs in the form of mixtures of continuous (crisp or fuzzy) numbers, linguistic information, and discrete (either ordinal or nominal) quantities, with provision also for missing information. Their internal stimulation is based on an explicit similarity relation between the input and weight tuples (which are also heterogeneous). The framework is comprehensive and several models can be derived as instances. In particular, two of the commonly used models are shown to compute a specific similarity function provided all inputs are real-valued and complete. An example family of models defined by composition of a Gower-based similarity with a sigmoid function is shown to lead to network designs (Heterogeneous Neural Networks) capable of learning from non-trivial data sets with remarkable effectiveness, comparable to that of classical models.
    Peer Reviewed
    Postprint (author's final draft)

    On aggregation operators of transitive similarity and dissimilarity relations

    Similarity and dissimilarity are widely used concepts. One of the most studied matters is their combination or aggregation. However, the transitivity property is often ignored when aggregating, despite being a highly important property that has been studied by many authors from different points of view. We collect here some results on preserving transitivity when aggregating, intending to clarify the relationship between aggregation and transitivity and to make it useful for designing aggregation operators that preserve the transitivity property. Some examples of the utility of the results are also shown.
    Peer Reviewed
    Postprint (published version)

    Bayesian semi non-negative matrix factorisation

    Non-negative Matrix Factorisation (NMF) has become a standard method for source identification when data, sources and mixing coefficients are constrained to be positive-valued. The method has recently been extended to allow for negative-valued data and sources in the form of Semi- and Convex-NMF. In this paper, we re-elaborate Semi-NMF within a full Bayesian framework. This provides solid foundations for parameter estimation and, importantly, a principled method to address the problem of choosing the most adequate number of sources to describe the observed data. The proposed Bayesian Semi-NMF is preliminarily evaluated here in a real neuro-oncology problem.
    Peer Reviewed
    Postprint (published version)